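Mobile notification systems play an important role in a wide range of applications, sending alerts and reminders to inform users about news, events, or messages. In this paper, we formulate the near-real-time notification decision problem as a Markov decision process in which we optimize for multiple objectives in the reward. We propose an end-to-end offline reinforcement learning framework to optimize sequential notification decisions. We address the challenges of offline learning with a double Q-network method based on conservative Q-learning, which mitigates distributional shift and Q-value overestimation. We describe our fully deployed system and demonstrate the performance and benefits of the proposed approach through both offline and online experiments.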
translated by Google Translate
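The conservative Q-learning and double Q-network ideas named in the abstract above can be illustrated for a small discrete action set. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `alpha` weight, and the plain-list Q-values are all hypothetical, and the penalty shown is the commonly used log-sum-exp form of the conservative regularizer.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def double_q_target(reward, gamma, q_online_next, q_target_next, done):
    """Double Q-learning target.

    The online network chooses the next action and the target network
    evaluates it, which reduces Q-value overestimation.
    """
    if done:
        return reward
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best]


def conservative_q_loss(q_values, action, target, alpha=1.0):
    """Squared TD error plus a conservative penalty (alpha is an assumed weight).

    The penalty pushes down Q-values over all actions (log-sum-exp) and
    pushes up the Q-value of the action actually logged in the offline data.
    """
    td_error = 0.5 * (q_values[action] - target) ** 2
    penalty = logsumexp(q_values) - q_values[action]
    return td_error + alpha * penalty


# Hypothetical two-action example: action 0 = hold, action 1 = send.
target = double_q_target(1.0, 0.9, [0.2, 0.5], [1.0, 0.3], done=False)
loss = conservative_q_loss([0.4, 0.6], action=1, target=target)
```

Because the penalty is never negative, actions that rarely appear in the logged data do not receive inflated value estimates, which is the distributional-shift concern the abstract mentions.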
High-utility sequential pattern mining (HUSPM) has emerged as an important topic due to its wide application and considerable popularity. However, because the search space explodes combinatorially when the utility threshold is low or the data is large-scale, solving the HUSPM problem can be time-consuming and memory-intensive. Several algorithms have been proposed for this problem, but they remain costly in running time and memory usage. In this paper, to solve this problem more efficiently, we design a compact structure called sequence projection (seqPro) and propose an efficient algorithm, namely discovering high-utility sequential patterns with the seqPro structure (HUSP-SP). HUSP-SP utilizes the compact seq-array to store the necessary information in a sequence database. The seqPro structure is designed to efficiently calculate candidate patterns' utilities and upper bound values. Furthermore, a new upper bound on utility, namely tighter reduced sequence utility (TRSU), and two pruning strategies in the search space are utilized to improve the mining performance of HUSP-SP. Experimental results on both synthetic and real-life datasets show that HUSP-SP can significantly outperform the state-of-the-art algorithms in terms of running time, memory usage, search space pruning efficiency, and scalability.
The discovery of utility-driven patterns is a useful and difficult research topic. It can extract significant and interesting information from specific and varied databases, increasing the value of the services provided. In practice, the measure of utility is often used to demonstrate the importance, profit, or risk of an object or a pattern. In the database, although utility is a flexible criterion for each pattern, it is a more absolute criterion due to the neglect of utility sharing. As a result, the derived patterns explore only partial and local knowledge from a database. Utility occupancy is a recently proposed model that considers the problem of mining with high utility but low occupancy. However, existing studies concentrate on itemsets, which do not reveal the temporal relationship of object occurrences. Therefore, this paper works towards sequence utility maximization. We first define utility occupancy on sequence data and raise the problem of High Utility-Occupancy Sequential Pattern Mining (HUOSPM). Three dimensions, including frequency, utility, and occupancy, are comprehensively evaluated in HUOSPM. An algorithm called Sequence Utility Maximization with Utility occupancy measure (SUMU) is proposed. Furthermore, two data structures for storing related information about a pattern, Utility-Occupancy-List-Chain (UOL-Chain) and Utility-Occupancy-Table (UO-Table), together with six associated upper bounds, are designed to improve efficiency. Empirical experiments are carried out to evaluate the novel algorithm's efficiency and effectiveness. The influence of different upper bounds and pruning strategies is analyzed and discussed. The comprehensive results suggest that our algorithm is intelligent and effective.
With the rapid development of the Internet, the era of big data has arrived. The Internet generates huge amounts of data every day. However, extracting meaningful information from massive data is like looking for a needle in a haystack. Data mining techniques can provide various feasible methods to solve this problem. At present, many sequential rule mining (SRM) algorithms have been presented to find sequential rules in databases with sequential characteristics. These rules help people extract a lot of meaningful information from massive amounts of data. How can we compress the mined results and reduce data size to save storage space and transmission time? Until now, there has been little research on the compression of SRM. In this paper, combining the Minimum Description Length (MDL) principle with two metrics (support and confidence), we introduce the problem of compression of SRM and propose a solution named ComSR for MDL-based compression of sequential rules, built on a purpose-designed sequential rule coding scheme. To our knowledge, we are the first to use sequential rules to encode an entire database. A heuristic method is proposed to find a set of sequential rules that is as compact and meaningful as possible. ComSR has two trade-off algorithms, ComSR_non and ComSR_ful, based on whether the database can be completely compressed. Experiments on a real dataset with different thresholds show that a set of compact and meaningful sequential rules can be found. This shows that the proposed method works.
In artificial intelligence, the traditional approach of training machine learning models centrally on a server has several security and privacy deficiencies. To address this limitation, federated learning (FL) has been proposed and is known for breaking down "data silos" and protecting the privacy of users. However, FL has not yet gained popularity in industry, mainly due to concerns about its security, privacy, and high communication cost. To advance research in this field, build a robust FL system, and realize the wide application of FL, this paper systematically sorts out the possible attacks on current FL systems and the corresponding defenses. Firstly, this paper briefly introduces the basic workflow of FL and related knowledge of attacks and defenses. It reviews a great deal of recent research on privacy theft and malicious attacks. Most importantly, in view of the current three classification criteria, namely the three stages of machine learning, the three different roles in federated learning, and the CIA (Confidentiality, Integrity, and Availability) guidelines on privacy protection, we divide attack approaches into two categories according to the training stage and the prediction stage in machine learning. Furthermore, we also identify the CIA property violated by each attack method and the potential attack role. Various defense mechanisms are then analyzed separately at the level of privacy and of security. Finally, we summarize the possible challenges in the application of FL from the aspect of attacks and defenses and discuss the future development direction of FL systems. In this way, the designed FL system can resist different attacks and is more secure and stable.
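Contrast pattern mining (CPM) is an important and popular subfield of data mining. Traditional sequential patterns cannot describe the contrast information between different classes of data, whereas contrast patterns, which involve the concept of contrast, can describe significant differences between datasets under different contrast conditions. Judging by the number of papers published in this field, researchers' interest in CPM remains active. Because CPM involves many research questions and research methods, it is difficult for researchers new to the field to grasp its general state in a short time. The purpose of this paper is therefore to provide an up-to-date, comprehensive overview of the research directions in contrast pattern mining. First, we present an in-depth understanding of CPM, including its basic concepts, types, mining strategies, and the metrics for assessing discriminative ability. Then we classify CPM methods according to their characteristics into boundary-based algorithms, tree-based algorithms, evolutionary fuzzy system-based algorithms, decision tree-based algorithms, and other algorithms. In addition, we list the classic algorithms for these methods and discuss their advantages and disadvantages. Advanced topics in CPM are also presented. Finally, we conclude the survey by discussing the challenges and opportunities in this field.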
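High-utility sequential pattern mining (HUSPM) is an important activity in knowledge discovery and data analytics with many real-world applications. In some cases, HUSPM cannot provide a good measure for predicting what will happen. High-utility sequential rule mining (HUSRM) discovers sequential rules with high utility and high confidence, which allows it to address this problem in HUSPM. All existing HUSRM algorithms aim to find high-utility partially-ordered sequential rules (HUSRs), which are not consistent with reality and may generate fake HUSRs. Therefore, in this paper, we formulate the problem of high-utility totally-ordered sequential rule mining and propose two novel algorithms, called TotalSR and TotalSR+, which aim to identify all high-utility totally-ordered sequential rules (HTSRs). TotalSR creates a utility table that can efficiently calculate antecedent support and a utility prefix sum list that can compute the remaining utility of a sequence in O(1) time. We also introduce a left-first expansion strategy that can exploit the anti-monotonic property to apply a confidence pruning strategy. TotalSR can also drastically reduce the search space with the help of pruning strategies based on utility upper bounds, avoiding much unnecessary computation. In addition, TotalSR+ uses an auxiliary antecedent record table to discover HTSRs more efficiently. Finally, numerous experimental results on both real and synthetic datasets show that TotalSR is significantly more efficient than algorithms with fewer pruning strategies, and that TotalSR+ is more efficient still in terms of running time and scalability.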
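The O(1) remaining-utility lookup mentioned above follows from storing cumulative sums. The snippet below is a minimal sketch of that general idea, assuming a sequence is flattened into a list of per-item utilities; the function names and data layout are hypothetical and are not taken from TotalSR itself.

```python
from itertools import accumulate


def build_prefix_sums(item_utilities):
    """Return prefix sums where prefix[k] is the utility of the first k items."""
    return [0] + list(accumulate(item_utilities))


def remaining_utility(prefix, i):
    """Utility of all items strictly after 0-based position i, in O(1) time."""
    return prefix[-1] - prefix[i + 1]


# Hypothetical sequence with four items and their utilities.
utilities = [5, 2, 4, 1]
prefix = build_prefix_sums(utilities)   # [0, 5, 7, 11, 12]
rest = remaining_utility(prefix, 1)     # utility after the second item: 4 + 1
```

Building the list costs one linear pass per sequence, after which every remaining-utility query is a single subtraction rather than a rescan of the tail.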
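Domain generalization (DG) aims to learn a model on several source domains, in the hope that the model will generalize well to unseen target domains. The distribution shift between domains comprises covariate shift and conditional shift, both of which the model must be able to handle for better generalizability. In this paper, a novel DG method is proposed to deal with distribution shift via Visual Alignment and Uncertainty-guided belief Ensemble (VAUE). Specifically, for covariate shift, a visual alignment module is designed to align the distribution of image style to a common empirical Gaussian distribution, so that covariate shift can be eliminated in the visual space. For conditional shift, we adopt an uncertainty-guided belief ensemble strategy based on subjective logic and Dempster-Shafer theory. The conditional distribution given a test sample is estimated by a dynamic combination of those of the source domains. Comprehensive experiments are conducted to demonstrate the superior performance of the proposed method on four widely used datasets, i.e., Office-Home, VLCS, TerraIncognita, and PACS.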
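Temporal action localization (TAL) aims to predict the action category and temporal boundaries (i.e., start and end times) of action instances in untrimmed videos. Fully supervised solutions are adopted in most existing work and have proven effective. One practical bottleneck of these solutions is the large amount of labeled training data they require. To reduce the expensive human labeling cost, this paper focuses on a rarely investigated yet practical task, semi-supervised TAL, and proposes an effective active learning method named AL-STAL. We use four steps, named Train, Query, Annotate, and Append, to actively select highly informative video samples and train the localization model. AL-STAL is equipped with two scoring functions that consider the uncertainty of the localization model, which facilitate the ranking and selection of video samples. One takes the entropy of the predicted label distribution as a measure of uncertainty, called Temporal Proposal Entropy (TPE). The other introduces a new metric based on the mutual information between adjacent action proposals to evaluate the informativeness of video samples, called Temporal Context Inconsistency (TCI). To validate the effectiveness of the proposed method, we conduct extensive experiments on two benchmark datasets, THUMOS'14 and ActivityNet 1.3. Experimental results show that AL-STAL outperforms existing competitors and achieves satisfactory performance compared with fully supervised learning.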
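The entropy-based scoring described above can be sketched as follows. This is an illustration under assumptions rather than the AL-STAL code: averaging proposal entropies into one video score, the function names, and the dictionary layout of the unlabeled pool are all hypothetical choices.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def video_score(proposal_distributions):
    """Assumed aggregation: mean entropy over a video's action proposals."""
    total = sum(entropy(p) for p in proposal_distributions)
    return total / len(proposal_distributions)


def select_for_annotation(unlabeled, k):
    """Query step: return the k video ids with the highest uncertainty score."""
    ranked = sorted(unlabeled, key=lambda vid: video_score(unlabeled[vid]), reverse=True)
    return ranked[:k]


# Hypothetical pool: video id -> list of per-proposal class distributions.
pool = {
    "video_a": [[0.9, 0.1], [0.8, 0.2]],
    "video_b": [[0.5, 0.5], [0.6, 0.4]],
    "video_c": [[1.0, 0.0]],
}
queried = select_for_annotation(pool, k=1)
```

In a Train, Query, Annotate, Append loop, the queried videos would be labeled, appended to the training set, and the model retrained before the next round of scoring.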
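In group activity recognition, hierarchical frameworks are widely adopted to represent the relationships between individuals and their corresponding group, and have achieved promising performance. However, existing methods simply employ max/average pooling in this framework, which ignores the distinct contributions of different individuals to group activity recognition. In this paper, we propose a new contextual pooling scheme, named attentive pooling, which enables a weighted information transition from individual actions to group activity. By utilizing the attention mechanism, attentive pooling is interpretable and able to embed member context into existing hierarchical models. To verify the effectiveness of the proposed scheme, two specific attentive pooling methods are designed, i.e., global attentive pooling (GAP) and hierarchical attentive pooling (HAP). GAP rewards the individuals that are significant to the group activity, while HAP further considers the hierarchical division by introducing a subgroup structure. Experimental results on benchmark datasets show that our proposals are significantly superior to the baselines and comparable to state-of-the-art methods.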
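The contrast between average pooling and attention-weighted pooling can be shown with a small sketch. It assumes each individual has a feature vector and a scalar relevance score; how the paper computes those scores is not stated in the abstract, so the scores and function names here are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw relevance scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(features):
    """Baseline: every individual contributes equally to the group feature."""
    n = len(features)
    return [sum(f[d] for f in features) / n for d in range(len(features[0]))]


def attentive_pool(features, scores):
    """Weighted sum of individual features using softmax attention weights."""
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


# Three hypothetical individuals with 2-D features; the second is most relevant.
person_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
group_feature = attentive_pool(person_features, scores=[0.1, 3.0, 0.1])
```

With equal scores the attentive result reduces to mean pooling, and the weights themselves can be inspected to see which individuals drove the group prediction.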